A High-Speed Document Image Classifier

Author

Lejun Shao

Abstract

This paper presents a high-speed document image classification algorithm. The algorithm follows a bottom-up strategy that can segment and classify documents with any type of sophisticated layout, without being limited to Manhattan layouts. Special techniques are employed to overcome the slow speed of most bottom-up algorithms. During segmentation, the algorithm uses a byte-based operation to build a list of connected components within each scan line and to merge touching connected components in adjacent scan lines. The texture features of each connected component are also obtained during this process, which greatly reduces the computation time spent in the classification stage. For a typical A4-size document image at 200 DPI resolution, the average processing time on a Pentium-based IBM PC is 1.2 seconds.
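The scan-line idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the byte-based operation is not reproduced (rows here are plain lists of 0/1 pixels), 8-connectivity is an assumption, and the names `find_runs`, `find_root`, and `connected_components` are hypothetical. The sketch finds black runs in each scan line, merges runs that touch a run in the previous line using union-find, and accumulates simple per-component statistics (bounding box, pixel count, run count) in the same pass, standing in for the texture features the abstract mentions.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, int]  # (start, end) inclusive columns of one black run


def find_runs(row: List[int]) -> List[Run]:
    """Return the black-pixel runs (value 1) found in one scan line."""
    runs: List[Run] = []
    start = None
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x                      # a run begins
        elif not pixel and start is not None:
            runs.append((start, x - 1))    # a run ends
            start = None
    if start is not None:                  # run reaches the right edge
        runs.append((start, len(row) - 1))
    return runs


def find_root(parent: List[int], i: int) -> int:
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def connected_components(image: List[List[int]]) -> List[Dict[str, int]]:
    """Label components line by line, gathering statistics on the way."""
    parent: List[int] = []
    stats: List[Dict[str, int]] = []
    prev: List[Tuple[int, int, int]] = []  # (start, end, label) of previous line
    for y, row in enumerate(image):
        current: List[Tuple[int, int, int]] = []
        for start, end in find_runs(row):
            label = len(parent)
            parent.append(label)
            stats.append({"x0": start, "y0": y, "x1": end, "y1": y,
                          "pixels": end - start + 1, "runs": 1})
            for p_start, p_end, p_label in prev:
                # Assumed 8-connectivity: runs touch if they overlap
                # or are diagonally adjacent.
                if p_start <= end + 1 and p_end >= start - 1:
                    a = find_root(parent, p_label)
                    b = find_root(parent, label)
                    if a != b:
                        parent[b] = a      # merge component b into a
                        sa, sb = stats[a], stats[b]
                        sa["x0"] = min(sa["x0"], sb["x0"])
                        sa["y0"] = min(sa["y0"], sb["y0"])
                        sa["x1"] = max(sa["x1"], sb["x1"])
                        sa["y1"] = max(sa["y1"], sb["y1"])
                        sa["pixels"] += sb["pixels"]
                        sa["runs"] += sb["runs"]
            current.append((start, end, label))
        prev = current
    # Only root labels carry the merged statistics.
    return [stats[i] for i in range(len(parent)) if parent[i] == i]


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
    ]
    for component in connected_components(page):
        print(component)
```

Because each component's statistics are updated at merge time, a later classification step could read them directly without revisiting the pixels, which is the kind of saving the abstract attributes to collecting features during segmentation.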

Similar resources

Persian Printed Document Analysis and Page Segmentation

This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments the document image into a set of regions. In the high-resolution page segmentation, connected component analysis segments each region into homogeneous regions and identifyi...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

Detection of Text with Connected Component Clustering

Text detection and recognition is a hot topic for researchers in the field of image processing. It gives attention to Content based Image Retrieval (CBIR) community in order to fill the semantic gap between low level and high level features. Several methods have been developed for text detection and extraction that achieve reasonable accuracy for natural scene text (camera images) as well as mu...

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. The proposed method presents a framework using relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Publication date: 1994
